Search for: All records

Creators/Authors contains: "Stone, Gregory E"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Testing is a part of education around the world; however, there are concerns that the consequences of testing are underexplored within current educational scholarship. Moreover, usability studies are rare within education. One aim of the present study was to explore the usability of a mathematics problem-solving test, the Problem Solving Measures–Computer-Adaptive Test (PSM-CAT), designed for students in grades six to eight (ages 11–14). The second aim of this mixed-methods research was to unpack consequences-of-testing validity evidence related to the results and test interpretations, leveraging the voices of participants. A purposeful, representative sample of over 1000 students from rural, suburban, and urban districts across the USA was administered the PSM-CAT, followed by a survey. Approximately 100 of those students were interviewed following test administration. Findings indicated that (1) participants engaged with the PSM-CAT as intended and found it highly usable (e.g., most respondents were able to find and use the calculator, and several students commented that they engaged with the test as intended) and (2) the benefits of testing largely outweighed any negative outcomes (e.g., 92% of students interviewed had positive attitudes toward the testing experience), which in turn supports consequences-of-testing validity evidence for the PSM-CAT. This study provides an example of a usability study for educational testing and builds on previous calls for greater research on the consequences of testing.
    Free, publicly-accessible full text available June 1, 2026
  2. This study explored how mathematics problem-solving constructed-response tests compared in terms of item psychometrics when administered to eighth-grade students in two different static formats: paper-pencil and computer-based. Quantitative results indicated similarity across all psychometric indices, both for the overall tests and at the item level.
    Free, publicly-accessible full text available March 8, 2026
  3. Determining the most appropriate method of scoring an assessment is based on multiple factors, including the intended use of results, the assessment's purpose, and time constraints. Both the dichotomous and partial credit models have their advantages, yet direct comparisons of assessment outcomes from each method are not typical with constructed-response items. The present study compared the impact of both scoring methods on the internal structure and consequential validity of a middle-grades problem-solving assessment called the Problem Solving Measure for Grade Six (PSM6). After the assessment was scored both ways, Rasch dichotomous and partial credit analyses indicated similarly strong psychometric findings across models. Student outcome measures on the PSM6, scored both dichotomously and with partial credit, demonstrated a strong, positive, significant correlation. Similar demographic patterns were noted regardless of scoring method. Both scoring methods produced similar results, suggesting that either would be appropriate to use with the PSM6. (An illustrative sketch of this kind of scoring comparison follows the list.)
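The third result above compares dichotomous and partial-credit scoring of the same constructed-response items. As a rough illustration only, the Python sketch below rescores hypothetical partial-credit data dichotomously and correlates the resulting raw totals. The collapse rule (full credit counts as correct) and the use of raw totals in place of Rasch person measures are assumptions made purely for illustration, not the authors' procedure; the study fit Rasch dichotomous and partial credit models, which require dedicated IRT software.

# Illustrative sketch: dichotomous vs. partial-credit scoring comparison.
# NOT the PSM6 authors' method; data, collapse rule, and use of raw totals
# as a stand-in for Rasch person measures are assumptions for illustration.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Hypothetical constructed-response data: 200 students x 10 items,
# each item scored 0, 1, or 2 under a partial-credit rubric.
partial_credit = rng.integers(0, 3, size=(200, 10))

# Dichotomous rescore: full credit -> 1, anything less -> 0 (assumed rule).
dichotomous = (partial_credit == 2).astype(int)

# Raw total scores as a crude proxy for person outcome measures.
pc_totals = partial_credit.sum(axis=1)
di_totals = dichotomous.sum(axis=1)

# Correlate student outcomes under the two scoring methods.
r, p = pearsonr(pc_totals, di_totals)
print(f"Correlation between scoring methods: r = {r:.2f} (p = {p:.3g})")

With real response data, a strong positive correlation between the two sets of outcome measures would be consistent with the study's finding that either scoring method yields comparable results.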